High Speed Hashing for Integers and Strings

Author

  • Mikkel Thorup
Abstract

The concept of truly independent hash functions is extremely useful in the design of randomized algorithms. We have a large universe U of keys, e.g., 64-bit numbers, that we wish to map randomly to a range [m] = {0, ..., m − 1} of hash values. A truly random hash function h : U → [m] assigns an independent uniformly random variable h(x) to each key x ∈ U. The function h is thus a |U|-dimensional random variable, picked uniformly at random among all functions from U to [m]. Once h has been picked, it is no longer random, so h(x) is fixed for all keys x ∈ U. Unfortunately, truly random hash functions are idealized objects that cannot be implemented. More precisely, to represent a truly random hash function, we need to store at least |U| log₂ m bits, and in most applications of hash functions, the whole point of hashing is that the universe is much too large for such a representation (at least in fast internal memory). The idea is to let hash functions contain only a small element, or seed, of randomness, so that the hash function is sufficiently random for the desired application, yet the seed is small enough that we can store it once it is fixed. In these notes we will discuss some basic forms of random hashing that are very efficient to implement, and yet have sufficient randomness for some very important applications.
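As a rough illustration of the contrast drawn in the abstract, the sketch below compares the storage needed for a truly random table with a hash function determined by a small seed. The abstract does not name a specific scheme; the multiply-mod-prime construction h(x) = ((a·x + b) mod p) mod m is used here only as one well-known example, and the names `P`, `make_hash`, and `truly_random_table_bits` are hypothetical.

```python
import random

# Assumption: 2^89 - 1 is a Mersenne prime larger than any 64-bit key.
# It is an illustrative choice, not taken from the abstract.
P = (1 << 89) - 1


def make_hash(m, rng=random):
    """Draw a small seed (a, b) and return h(x) = ((a*x + b) mod P) mod m.

    Once the seed is drawn, h is a fixed function: h(x) never changes.
    """
    a = rng.randrange(1, P)  # a is uniform in {1, ..., P-1}
    b = rng.randrange(0, P)  # b is uniform in {0, ..., P-1}

    def h(x):
        return ((a * x + b) % P) % m

    return h, (a, b)


def truly_random_table_bits(universe_bits, m):
    """Bits needed to store a truly random h: |U| * log2(m).

    Assumes |U| = 2^universe_bits and that m is a power of two.
    """
    return (1 << universe_bits) * (m.bit_length() - 1)
```

For 64-bit keys and m = 1024, `truly_random_table_bits(64, 1024)` evaluates to 10 · 2^64 bits, whereas the seed (a, b) above occupies fewer than 2 · 89 bits. Whether such a seeded function is random enough depends on the application.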

Similar resources

String hashing for linear probing

Linear probing is one of the most popular implementations of dynamic hash tables storing all keys in a single array. When we get a key, we first hash it to a location. Next we probe consecutive locations until the key or an empty location is found. At STOC’07, Pagh et al. presented data sets where the standard implementation of 2-universal hashing leads to an expected number of Ω(log n) probes....

Fast and Compact Hash Tables for Integer Keys

A hash table is a fundamental data structure in computer science that can offer rapid storage and retrieval of data. A leading implementation for string keys is the cache-conscious array hash table. Although fast with strings, there is currently no information in the research literature on its performance with integer keys. More importantly, we do not know how efficient an integer-based array ha...

Markov Chain Monte Carlo for Arrangement of Hyperplanes in Locality-Sensitive Hashing

Since Hamming distances can be calculated by bitwise computations, they can be calculated with less computational load than L2 distances. Similarity searches can therefore be performed faster in Hamming distance space. The elements of Hamming distance space are bit strings. On the other hand, the arrangement of hyperplanes induces the transformation from the feature vectors into feature bit stri...

On Strings of Consecutive Integers with No Large Prime Factors

We investigate conditions which ensure that systems of binomial polynomials with integer coefficients are simultaneously free of large prime factors. In particular, for each positive number ε, we show that there are infinitely many strings of consecutive integers of size about n, free of prime factors exceeding n, with the length of the strings tending to infinity with speed log log log log n. ...

Perfect Hashing for Strings: Formalization and Algorithms

Numbers and strings are two objects manipulated by most programs. Hashing has been well-studied for numbers and it has been effective in practice. In contrast, basic hashing issues for strings remain largely unexplored. In this paper, we identify and formulate the core hashing problem for strings that we call substring hashing. Our main technical results are highly efficient sequential/parallel (...

Journal title:
  • CoRR

Volume abs/1504.06804

Publication date 2014